Students’ performance on standardized tests is clearly predictive of their later outcomes 
(Goldhaber & Ozek, 2019), but whether the costs of administering tests are justified by the value 
of the tests for improving students’ outcomes is controversial. This controversy fuels heated 
debates over federal testing requirements, such as those instituted under No Child Left Behind 
(NCLB) and continued under the Every Student Succeeds Act (ESSA). All ESSA-required tests 
were canceled in the 2019-20 school year due to COVID-19, and there was significant political 
debate about whether they ought to be required in 2020-21. Ultimately the U.S. Department of 
Education (USDOE) allowed for some testing flexibility in the form of ESSA waivers. 

Below, we outline some of the common ways states might use tests to improve student 
outcomes and the implications of requested and approved waivers for these uses. In particular, 
we highlight features of waiver requests that are especially important if tests are to be used by 
specific actors (e.g., families) to benefit students in specific ways. We conclude with a 
discussion of how testing policy needs to be designed if statewide tests are going to both be 
useful and maintain political support. 


Uses of Administered Standardized Tests 


Beliefs about how testing can improve education typically fall into at least one of three 
categories. First, tests can be diagnostic tools to target educational supports. This includes 
common student-level interventions, such as targeted tutoring or decisions about advanced 
coursework (e.g., Kraft, 2021; McEachin et al., 2020). Tests are also sometimes used to direct 
supports to schools, as with NCLB’s requirement that School Improvement Funds and technical 
assistance be directed to schools identified as needing support based on test scores (McClure, 
2005). Second, test results can be useful for research and evaluation. Test scores are widely used 
as a primary outcome variable in work assessing the efficacy of educational interventions, 
practices, and policies, or to assess achievement gaps between groups of students. 

Finally, tests, as required by both NCLB and ESSA, are a prominent part of school 
accountability systems (O’Keefe et al., 2021). The idea here is that rewards or sanctions 
connected to test results drive improvements to school or teacher practices.1 For example, 
statewide standardized tests may be used to evaluate teachers based on their contributions to 
students’ test scores, and publicly reporting schools’ test performance may prompt families to put 
pressure on school administrators (e.g., by exercising school choice). Similarly, federal School 
Improvement Grants fund various types of whole school reforms that are targeted based on 
student test performance (USDOE, 2010). 

More broadly, tests are used to assess where states (and the country as a whole) are in 
terms of achievement, and in serving student subgroups, to inform policy and practice. For 
instance, there is significant concern about the degree to which the COVID-19 pandemic has set 
back educational achievement, and much of the early empirical evidence on that is based on test 
achievement (e.g., Caprariello, 2021). Assessments have long been used to ascertain the degree 
to which the nation is addressing the significant gaps in achievement that exist between students 
based on race/ethnicity and/or poverty (Goldhaber et al., 2018). Although this is not a direct use of test 
scores, such assessments may be quite important in focusing attention and resources on the 
achievement gaps that exist in schools (Hess, 2011). 

In this policy brief, we focus on some of the ways that tests might improve education. We 
emphasize might because, as we describe below, there is considerable debate and controversy 
about the usefulness of standardized tests, and test utility depends very much on how such 
policies are designed, who is supposed to use the results (e.g., teachers, families, or 
policymakers), and what the results are supposed to be used for. While not exhaustive, in Table 1 
we illustrate many of the most common combinations of purpose and user imagined by testing 
advocates.2 Specific uses of testing may not fit unambiguously into a single cell of Table 1, but 
the table offers a framework for thinking about how standardized tests might be used, and about 
how ESSA waivers might limit such uses. 

Importantly, while Table 1 outlines potential uses and users of test results, it is not clear 
that this potential is being fully realized or that the benefits exceed the downsides of test 
administration. For example, as we argue below, state test results are often not disaggregated in 
ways that identify individual student skills or needs such that they provide actionable and timely 
information.3 And testing critics argue that test-based accountability leads to narrowing the 
curriculum and to excessive time spent on test preparation (e.g., Koretz, 2009, 2017). 


1 For evidence that test-based accountability can improve educational outcomes, see, for instance, Dee and Jacob 
(2011), Reback et al. (2014), and Wong et al. (2015). 
2 For a more expansive discussion of the various uses of tests, see Ho (2014). 

While federal testing requirements are popular in the abstract (Henderson et al., 2019), 
this support appears fragile. For example, 41% of respondents in 2020 endorsed the view that 
there is “too much emphasis on achievement testing” in public schools, up from 37% in 2008 
and 20% in 1997 (PDK Poll, 2020). And net support for testing requirements falls by 20 points 
when respondents are told that test administration takes on average eight hours per year 
(Henderson et al., 2019). State testing is popular, but that support comes with reservations and 
may be malleable. The pandemic and ESSA testing waivers introduce additional uncertainty 
about the use of state tests, which could have important implications for the viability of testing 
policy. 


ESSA Test Waivers in 2021 


Concerns about testing in a pandemic prompted USDOE to allow states to request 
waivers for some ESSA testing requirements. Specifically, on February 22, 2021, USDOE 
provided formal guidance to states requesting temporary ESSA flexibility in three areas 
(Rosenblum, 2021). First, states could request waivers from ESSA’s accountability and school 
identification requirements, such as requiring the use of test data to differentiate schools. Second, 
some public reporting requirements could be waived, especially those related to test scores. 
Third, USDOE offered flexibility around test administration and recommended considering 
practices such as remote administration and lengthier testing windows. 

Because this waiver request process allows states to ask to jettison components of federal 
test-related policies, it may shed light on which of the three aforementioned purposes of testing, 
for different users of testing data, are most salient to states4 and how tests are likely to be used 
going forward. 


Reflections on State Test Waivers: What Are They and What Do They Suggest? 


Table 2 summarizes the initial formal requests that states made for waivers from ESSA’s 
test-related requirements; USDOE approvals are in bold.5 In our discussion, we focus on how the 
waiver requests would, if approved, influence the ways test results can be used. But first, it is 
important to emphasize that waiver requests were not as extensive as permissible. In particular, 
we point to what is absent from Table 2: only 12 states are represented in the table, indicating 
that the vast majority did not request any waivers beyond the bare minimum of what USDOE 
guaranteed requesting states.6 That the requests for waivers were not more widespread and 
ambitious may indicate that many policymakers have faith in the value of their annual test 
regimes for at least some purposes. 


3 State test results, for instance, often take several months to make it into the hands of educators or families, 
impeding the use of testing to help individual students (Marsh et al., 2006). 

4 Of course, the decision to request waivers reflects complex political dynamics. Additionally, we have information 
only on observed waiver requests, which might reflect informal (and unobserved) negotiations with USDOE and 
suppositions by policymakers about what is likely to be approved. 

5 In a few cases states modified requests over time, such as Colorado’s initial request to eliminate science testing, 
which was later modified to administer science tests in grades 8 and 11. 

6 Moreover, only 36 states requested the separate but guaranteed (for approval) “Accountability and School-Level 
Identification” waivers, which eliminate requirements that 95% of eligible students be tested and that results be used 
to differentiate schools. 


A Number of States Are Prioritizing Flexibility Over Statewide or Cross-Year Comparability 


Several requested changes, often approved, provide flexibility in the timing of tests, test 
length, grades and subjects tested, and the test instrument itself. While all these approved changes 
provide districts with flexibility, they sacrifice to some degree statewide or cross-year 
comparability of test results. Most prominently, nine state education agencies asked for districts 
to have flexibility over the specific tests they administer. While this would not make 
cross-district comparisons impossible, it would certainly make them more challenging, much like 
comparing achievement across states that use different tests (Kuhfeld et al., 2019). 

The use of local assessments would have no immediate impact on accountability, both 
because accountability provisions are waived this year and because most states, including those 
requesting flexibility around local choice of exam, use test score levels to assess schools rather 
than year-to-year growth measures. But growth-based measures tend to be favored by 
researchers as a means of identifying school quality (Polikoff, 2017), and the local assessment 
option may hamper any momentum toward growth-based accountability in states with this 
waiver (e.g., Fensterwald, 2021). 

The use of local assessments would also hamper policymakers wishing to, for instance, 
figure out which districts are handling COVID-19 recovery efforts well or poorly. Similarly, a 
lack of comparability across districts could hamper the identification of students for interventions 
or specialized (e.g., gifted and talented) programs. 

Testing flexibility has enormous implications for educational research. Test results from 
2021 will necessarily be difficult to use as a baseline (Ho, 2021), but changes to instruments will 
further limit comparisons to tests used in the 2021-22 school year. And if a state does not 
administer a test in a grade or subject (e.g., cancelling science tests or assessing math and ELA 
only in alternating grade levels, as requested by some states), no baseline would be available in 
those cases. Moreover, states did not administer tests in 2020, and more than one “gap” year of 
test score data makes modeling student growth, which is often essential for effective research and 
evaluation, highly impracticable (Fazlul et al., 2021). 

Even where common tests are administered statewide, other flexibility granted to states 
may make results difficult to compare across jurisdictions or years. In particular, waiving the 
requirement that 95% of eligible students participate in testing may result in 
variation in test participation across student subgroups, making comparisons across years and between 
groups less reliable. Consequently, uncertainty about student achievement and how it was 
impacted by the pandemic or various interventions is likely to echo across education research, 
policy, and practice for years. 


Waivers Would Make State Test-Score Feedback (Even) Less Useful 


Diagnostic uses of test results (e.g., for remediation) are less likely where states have 
delayed their testing, as USDOE encouraged, which is in turn likely to delay the receipt of 
results. In one extreme case, New Jersey received approval to move the state test into the fall. 
This would prevent districts from using the results as a diagnostic tool to inform summer 
programming, though locally administered tests may still be available. Similarly, results released 
in the fall can be of no help to families in terms of selecting appropriate academic interventions 
for children over the summer. 




States are also supposed to “provide to each individual parent...information on the level 
of achievement and academic growth of the student...on each of the State academic 
assessments.”7 Even prior to the pandemic, these individual reports to families looked quite 
different from state to state and often did not provide much information beyond whether students 
are below or above state standards.8 This does not mean that parents do not receive test-based 
information. They may, for instance, receive data from locally administered tests, and/or state 
test results could trigger conversations between teachers and parents in which information is 
conveyed. Nevertheless, waivers will make it more challenging to get parents detailed, 
actionable information. The challenges arise not only because delayed testing windows push 
reporting further out, but also because shorter or alternative tests likely limit the degree to which 
the assessments can be used to let parents know where their students stand in terms of specific 
skills. 


USDOE Draws a Hard Line on Common Assessments 


Excluding requests for waivers of common statewide test requirements, of the 40 total 
specific waiver requests we outline in the cells of Table 2, the Biden administration approved 28 
(70%) at least in part (shown in bold). It is notable, then, that there appears to be one requirement 
that USDOE was particularly unlikely to waive: namely, that schools administer a common 
assessment statewide. Despite many requests by states to rely on LEA-chosen assessments, only 
the District of Columbia (where most LEAs are charter school networks) was able to secure such 
a waiver. Why would the federal government draw a particularly hard line here, while granting 
considerable flexibility in other areas and acknowledging — both implicitly and explicitly — that 
state tests from 2021 are likely to be of limited use (e.g., for accountability)? We speculate that 
there was concern that even temporarily waiving statewide tests would give momentum to those 
advocating for the elimination of testing altogether. That is, USDOE (and perhaps states that 
did not request that common assessments be waived) may be less interested in what happens 
with testing this year than worried about a slippery slope toward increasingly lax testing 
requirements. 


General Implications for Testing Policy 


Our discussion of states’ waiver requests is not intended to be exhaustive, but rather to 
reflect on some features of testing policy and what waiver requests might tell us about how test 
results might be used. In light of that discussion, we conclude with two more general 
recommendations for policymakers. 


7 See Section 1112(e)(1)(B) of: 
https://www.k12.wa.us/sites/default/files/public/assessment/statetesting/pubdocs/WA_Smarter_HS_samplescorereport.pdf 
8 For more about the content and format of reports that go to families, see Goodman and Hambleton (2004). 

First, policymakers and advocates of standardized testing should be more explicit in 
linking specific features of testing policy to specific theories of action about who will be using 
the test results and what they will be using the results for. Vague assertions that “we cannot fix 
what we do not measure” may be rhetorically useful, but they provide little rationale for any 
specific testing regime. Testing policies that are not motivated by specific theories of action for 
how their results will be used are likely to generate results that are underutilized, if they are used 
at all. A potentially useful example of an unusually clear theory of action can be found in 
Washington state’s waiver request. Although the plan was not approved in full, the waiver 
request explicitly outlined a statewide system of classroom, school, district-based and state 
assessments, each with clear purposes (Washington Office of Superintendent of Public 
Instruction, 2021). 




Second, we have a warning for those who — like us — believe that standardized tests can 
play a useful role in improving educational outcomes: Maintaining political support for such 
tests — and especially for statewide standardized tests — probably depends on demonstrating their 
diagnostic value to both educators and families, as these uses tend to be viewed most favorably 
(PDK Poll, 2020). Accountability policies are controversial, and research and evaluation are too 
far removed from the lives of most students, families, and educators to inspire deep political 
support for standardized tests. Yet the diagnostic uses that might rally public support by 
providing concrete and immediate benefits to students and schools have often been neglected 
even by staunch advocates of standardized testing. This is evidenced in part by how poorly state 
testing policy is typically designed for diagnostic purposes. For example, even prior to the 
pandemic, state test results often took too long to arrive and provided too little accessible 
and useful information about the knowledge and skills students need to develop to allow teachers 
or families to direct targeted support to children (Goodman & Hambleton, 2004; Marsh et al., 
2006; Mulvenon et al., 2005). It is therefore no surprise that many might be resistant to 
administering tests during a pandemic that has already drastically disrupted instructional time. 




Sketching out a comprehensive plan for diagnostic state tests is beyond the scope of this 
essay, but English Language Proficiency (ELP) testing may provide some useful lessons. As 
shown in Table 2, even states requesting waivers from various ESSA requirements for testing in 
math, ELA, and science typically requested at most minor waivers from ESSA’s ELP 
requirements. On the one hand, this may reflect a belief that such requests would not be granted, 
or the absence of a strong political constituency opposed to ELP testing in the way that teachers’ 
unions and many families oppose subject testing. On the other hand, the lack of a strong anti- 
ELP testing constituency is arguably a sign of political support for ELP testing in its own right. 

Another possibility, then, is that ELP testing is genuinely more popular than other forms 
of statewide standardized testing. That could be because ELP testing has institutionalized 
popular diagnostic uses in a way that other types of statewide standardized testing have not. In 
our experience, stakeholders generally believe that ELP results are at least somewhat credible 
signals of important student skills and can be used to classify students as needing specific kinds 
of productive interventions (viz., English learner services). In other words, because it can 
provide timely, credible information that specific educators are expected to use in specific ways, 
ELP testing policy may have developed robust political and institutional support in a way that 
other aspects of state and federal testing policy have not. 




The COVID-19 pandemic has heightened the salience of pre-existing concerns that 
administering statewide standardized tests is not worth it, even as it has heightened concerns 
that many students need substantial interventions to address lost learning opportunities and that 
gaps have widened. We encourage policymakers to think carefully, explicitly, and publicly 
about how they have tailored their standardized testing policies to achieve various diagnostic, 
research, and accountability objectives. This will help to ensure that standardized tests have 
benefits for more schools and students and will bolster fragile political support for statewide 
tests. 
Table 1: Potential Uses of Test Results to Improve Student Outcomes 


                                                          Users 
Purpose                 Students & Families     Teachers                                Administrators & Policymakers                   Researchers 
Diagnostic              Private tutoring        Differentiated instruction              Grade retention                                 N/A 
Research & Evaluation   N/A                     Reflecting on and improving practice    Program improvement                             Identifying gaps; evaluation 
Accountability          Inform school choices   N/A                                     Teacher evaluation; School Improvement Grants   N/A 

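Table 2: Features and Approval Status of States’ Initial Waiver Requests (a) 


Columns: Common Statewide Test; Test Length and Timing; English Language Proficiency (ELP); Subjects/Grades Tested; Proportion of Students Tested Within Grade; Students Can Opt In; Schools Differentiated Using 2020-21 Data; Results on School Report Cards. 

ESSA Baseline 
  Common Statewide Test: Yes- Common test across the state 
  Test Length and Timing: Normal length test in the Spring 
  English Language Proficiency (ELP): Yes, each Spring for K-12 ELLs 
  Subjects/Grades Tested: [partly illegible] Grades 3-8 + HS 
  Proportion of Students Tested Within Grade: 95% req'd, families can opt out 
  Students Can Opt In: No, testing is required as default 
  Schools Differentiated Using 2020-21 Data: Yes, schools identified for intervention in following fall 
  Results on School Report Cards: Yes, test results by subgroup, ESSA sec.1111(h)(1)(C)(ii) 

CA (b) 
  Common Statewide Test: No 
  Test Length and Timing: Shortened ELA & Math tests (reduced by half) 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 

CO 
  Subjects/Grades Tested: Grades 3-8 take either [illegible]; No science 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Students Can Opt In: Yes- to the other test (e.g., 4th grade ELA) 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 

DC 
  Common Statewide Test: No- LEAs administer interim/benchmark tests 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No- but will report LEA test data 

DE 
  Common Statewide Test: No- LEAs administer local tests 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 

GA 
  Common Statewide Test: No- LEAs administer formative tests 
  Subjects/Grades Tested: No high school tests [illegible]; No science 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No- but will report LEA test data 

MI 
  Common Statewide Test: No- LEAs administer benchmark tests 
  Test Length and Timing or ELP (column unclear): No 
  Subjects/Grades Tested: No science 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No 

MT 
  Common Statewide Test: No- LEAs have flexibility to choose the test 
  Test Length and Timing: Shortened statewide test is an option for LEAs 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 

NJ 
  Test Length and Timing: NJ Start Strong: shortened test (45-60 mins/subject) taken in the Fall 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No- but will report Start Strong test results in the Fall 

NY 
  Common Statewide Test: No 
  Students Can Opt In: Yes for in-person students, online students opt-in 

OR 
  Common Statewide Test: No- LEAs allowed to administer interim tests 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Students Can Opt In: Yes if safe & families opt-in 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No 

SC 
  Common Statewide Test: No- LEAs administer formative or diagnostic tests in ELA & Math (Science still statewide) 
  Proportion of Students Tested Within Grade: 95% within grade waived 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 

WA 
  Common Statewide Test: No- Request to allow sampling of schools administering the statewide test. Other schools encouraged to use local interim tests 
  Subjects/Grades Tested: ELA: 3 & 7; Math: 5 & 10; Science: 8 
  Proportion of Students Tested Within Grade: Representative sample of 25% of schools (10,000 students/grade); 95% within grade waived 
  Students Can Opt In: Yes for in-person students, online students opt-in 
  Schools Differentiated Using 2020-21 Data: No, same school identification as 2019-20 
  Results on School Report Cards: No- will report state-level results, but not school- or district-level 
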
a Empty cells indicate no waiver requested, or the waiver requested no change relative to the baseline. Bold cells indicate that USDOE approved the request. 
Waiver requests may include features and other waivers not described here (e.g., extended testing windows; waiver of the 1% cap on participation in the alternate 
test). This table reflects states’ initial waiver requests; some states (like Oregon) have submitted a second request after receiving feedback from USDOE. As of 
6/02/2021, thirty-six states & the Bureau of Indian Education (AZ, CA, CO, CT, DE, DC, FL, GA, IL, IN, KY, MA, MD, MI, MN, MS, MT, NC, ND, NE, NJ, 
NM, NV, OH, OK, OR, PA, SC, SD, TX, UT, VA, VT, WA, WI, WV) received “Accountability and School Level Identification Waivers”, which waive school 
differentiation and the 95% of students tested requirements, among other things. An example waiver can be found here. 

b In its conversation with USDOE, California described plans to administer the state summative tests except in districts where it is “not viable” to do 
so because of the pandemic. USDOE approved this but added, “Please note that viability refers to the ability to administer the statewide summative test given a 
district's specific circumstances in the context of the pandemic. It does not provide an opportunity for States or school districts to choose to administer local tests 
in place of the statewide summative test.” 